Rank | Count | Beginning |
---|---|---|
19016 | 2930 | O |
176 | 2477 | A |
9260 | 887 | É |
21834 | 759 | Os |
18071 | 556 | No |
23249 | 541 | Para |
9960 | 539 | Em |
4148 | 505 | As |
16726 | 396 | Na |
24504 | 396 | Por |
6373 | 351 | Com |
15867 | 338 | Mas |
16944 | 332 | Não |
7792 | 293 | De |
26795 | 250 | Se |
11650 | 243 | Está |
29011 | 220 | Um |
25430 | 200 | Quando |
6630 | 194 | Como |
12065 | 193 | Este |
14527 | 191 | Já |
26921 | 178 | Segundo |
28992 | 172 | Uma |
8188 | 162 | Depois |
3019 | 136 | Ao |
13796 | 129 | Há |
15582 | 129 | Mais |
17897 | 124 | NET |
2080 | 121 | Além |
13216 | 115 | Foi |
The next four subsections show the most frequent sentence beginnings consisting of N words, for N=1, 2, 3, 4. This subsection starts with N=1.
The most frequent word-N-grams at the beginning of sentences give some insight into sentence composition.
For N=1 in particular, even a small corpus is enough to identify the most frequent sentence beginnings.
SELECT substring_index(sentence, ' ', 1) AS beg, count(*) AS cnt
FROM sentences
GROUP BY substring_index(sentence, ' ', 1)
ORDER BY cnt DESC
LIMIT 50;
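The counting done by the query above can also be reproduced outside the database. The following is a minimal sketch in Python, assuming the sentences are available as a plain list of strings. The function name `top_sentence_beginnings`, its parameters, and the sample sentences are illustrative assumptions and are not part of the corpus tooling. Like `substring_index`, it keeps everything before the N-th space, so the same function covers N=1, 2, 3, 4.

```python
from collections import Counter


def top_sentence_beginnings(sentences, n_words=1, limit=50):
    """Count the first n_words space-separated tokens of each sentence.

    Hypothetical helper that mirrors substring_index(sentence, ' ', n_words):
    a sentence with fewer than n_words spaces is counted as a whole.
    """
    counts = Counter()
    for sentence in sentences:
        # Split on single spaces only, as the SQL function does.
        beginning = " ".join(sentence.split(" ")[:n_words])
        counts[beginning] += 1
    # most_common sorts by descending count, like ORDER BY cnt DESC LIMIT n.
    return counts.most_common(limit)


# Made-up sample sentences, for illustration only.
sample = [
    "O gato dorme.",
    "A casa é grande.",
    "O cão ladra.",
    "O dia está bonito.",
    "A noite caiu.",
]

for beginning, count in top_sentence_beginnings(sample, n_words=1):
    print(count, beginning)
```

Calling the function with `n_words=2` gives two-word beginnings in the same way. Ties are returned in order of first appearance, which may differ from the order the database reports.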
4.3.1.2 Most Frequent Sentence Beginnings II
4.3.1.3 Most Frequent Sentence Beginnings III
4.3.1.4 Most Frequent Sentence Beginnings IV
4.3.1.1 Most Frequent Sentence Endings I
4.3.1.2 Most Frequent Sentence Endings II
4.3.1.3 Most Frequent Sentence Endings III
4.3.1.4 Most Frequent Sentence Endings IV